Carving model-free inference
In many large-scale experiments, the investigator begins with pilot data to
look for promising findings. As fresh data become available at a later time,
or from a different source, she is left with the question of how to use the
full data for inference on the selected findings. Compensating for the
overoptimism from selection, carving permits reuse of the pilot data for valid
inference. The principles of carving are quite appealing in practice: instead
of throwing away the pilot samples, carving simply discards the information
consumed at the time of selection. However, the theoretical justification for
carving is strongly tied to parametric models, an example being the ubiquitous
Gaussian model. In this paper we develop asymptotic guarantees to substantiate
the use of carving beyond Gaussian generating models. In simulations and in an
application on gene expression data, we find that carving delivers valid and
tight confidence intervals in model-free settings.
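As a toy illustration of the idea (ours, not the paper's estimator, which covers regression and model-free settings), the following Python sketch carves inference for a single Gaussian mean with known variance: selection screens the pilot mean against a threshold, and the carved p-value compares the full-data mean against its Monte Carlo distribution conditional on that selection event, while data splitting discards the pilot entirely.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n1, n2, c = 40, 60, 0.0           # pilot size, fresh size, selection threshold

# Data: the investigator screens on the pilot mean only.
x = rng.normal(loc=0.4, scale=1.0, size=n1 + n2)
pilot_mean, full_mean = x[:n1].mean(), x.mean()
assert pilot_mean > c, "finding not selected; nothing to infer"

# Carved p-value for H0: mu = 0, by Monte Carlo: simulate the pilot and
# fresh means under H0, keep the draws passing selection, and compare the
# observed full-data mean against the resulting conditional distribution.
m1 = rng.normal(0.0, 1.0 / np.sqrt(n1), size=200_000)
m2 = rng.normal(0.0, 1.0 / np.sqrt(n2), size=200_000)
full = (n1 * m1 + n2 * m2) / (n1 + n2)
p_carve = np.mean(full[m1 > c] >= full_mean)

# Data splitting, by contrast, throws the pilot samples away.
p_split = norm.sf(x[n1:].mean() * np.sqrt(n2))
print(f"carved p = {p_carve:.3f}, split p = {p_split:.3f}")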
Approximate selective inference via maximum likelihood
This article considers a conditional approach to selective inference via
approximate maximum likelihood for data described by Gaussian models. There are
two important considerations in adopting a post-selection inferential
perspective. While one of them concerns the effective use of information in
data, the other aspect deals with the computational cost of adjusting for
selection. Our approximate proposal serves both purposes: (i) it exploits
randomization to make efficient use of the leftover information from
selection; (ii) it bypasses potentially expensive MCMC sampling from
conditional distributions. At the core of our method is the solution to a
convex optimization problem which assumes a separable form across multiple
selection queries. This allows us to address the problem of tractable and
efficient inference in many practical scenarios, where more than one learning
query is conducted to define and perhaps redefine models and their
corresponding parameters. Through an in-depth analysis, we illustrate the
potential of our proposal and provide extensive comparisons with other
post-selective schemes in both randomized and non-randomized paradigms of
inference.
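The flavor of the conditional approach is easiest to see in one dimension, where the randomized selective likelihood has a closed form. The Python sketch below is our own toy, not the paper's algorithm: observe Z ~ N(theta, 1), select when Z + omega > c for Gaussian noise omega, and maximize the exact conditional log-likelihood; the paper's approximation is what keeps the analogous optimization convex and separable across queries in realistic, higher-dimensional problems.

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

z, c, tau = 1.8, 1.0, 1.0         # observed statistic, threshold, noise scale

def neg_selective_loglik(theta):
    # Density of Z at z, times P(select | Z = z) (constant in theta, kept
    # for completeness), divided by P(select; theta) with Z + omega
    # distributed as N(theta, 1 + tau^2).
    log_num = norm.logpdf(z - theta) + norm.logcdf((z - c) / tau)
    log_den = norm.logcdf((theta - c) / np.sqrt(1 + tau**2))
    return -(log_num - log_den)

fit = minimize_scalar(neg_selective_loglik, bounds=(-5, 5), method="bounded")
print(f"selective MLE: {fit.x:.3f}  (the naive MLE would be z = {z})")

The selective MLE is pulled below z, correcting the winner's curse induced by conditioning on selection.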
Selective Inference with Distributed Data
As datasets grow larger, they are often distributed across multiple machines
that compute in parallel and communicate with a central machine through short
messages. In this paper, we focus on sparse regression and propose a new
procedure for conducting selective inference with distributed data. Although
many distributed procedures exist for point estimation in the sparse setting,
few options are available for estimating uncertainties or conducting hypothesis
tests based on the estimated sparsity. We solve a generalized linear regression
on each machine, which then communicates a selected set of predictors to the
central machine. The central machine uses these selected predictors to form a
generalized linear model (GLM). To conduct inference in the selected GLM, our
proposed procedure bases approximately-valid selective inference on an
asymptotic likelihood. The proposal requests only aggregated information, in
relatively few dimensions, from each machine, which is merged at the central
machine for selective inference. By reusing low-dimensional summary statistics
from local machines, our procedure achieves higher power while keeping the
communication cost low. This method also offers a solution to the notorious
p-value lottery problem that arises when model selection is repeated on random
splits of the data.
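The communication pattern can be sketched as follows (a toy of our own, using the lasso for local selection and omitting the paper's selective adjustment): each machine reports its selected support, the center broadcasts a selected set E, and the machines return only |E|-dimensional aggregates from which the center fits the selected model.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n_machines, n, p = 4, 200, 30
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]

# Local step: every machine selects predictors with the lasso and reports
# only its selected support to the center.
machines = []
for _ in range(n_machines):
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(size=n)
    machines.append((X, y))
supports = [np.flatnonzero(Lasso(alpha=0.2).fit(X, y).coef_ != 0)
            for X, y in machines]

# Central step: broadcast a selected set E (here, the union of supports);
# each machine then returns the |E| x |E| and |E|-dimensional aggregates,
# never its raw data, and the center fits the selected model from the sums.
E = np.array(sorted(set().union(*map(set, supports))))
XtX = sum(X[:, E].T @ X[:, E] for X, y in machines)
Xty = sum(X[:, E].T @ y for X, y in machines)
beta_E = np.linalg.solve(XtX, Xty)
print(dict(zip(E.tolist(), np.round(beta_E, 2))))

The naive refit above ignores that E was chosen from the same data; the paper's asymptotic-likelihood adjustment is what makes inference in the selected GLM selectively valid.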
Exact Selective Inference with Randomization
We introduce a pivot for exact selective inference with randomization. Not
only does our pivot lead to exact inference in Gaussian regression models, but
it is also available in closed form. We reduce the problem of exact selective
inference to a bivariate truncated Gaussian distribution. By doing so, we give
up some power that is achieved with approximate inference in Panigrahi and
Taylor (2022). Yet we always produce narrower confidence intervals than a
closely related data-splitting procedure. For popular instances of Gaussian
regression, we demonstrate this price, in terms of power, paid in exchange for
exact selective inference in simulated experiments and in an HIV drug
resistance analysis.
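The univariate analogue of such a pivot is easy to state and invert. The sketch below is our simplification (the paper's reduction is to a bivariate truncated Gaussian): it computes the truncated-Gaussian pivot for Z ~ N(theta, sigma^2) observed after the selection event {Z > a}, which is uniform at the true theta, and inverts it by root-finding to obtain an exact confidence interval.

import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

z, a, sigma = 2.4, 1.96, 1.0      # observed statistic, truncation point, scale

def pivot(theta):
    # P(Z <= z | Z > a) for Z ~ N(theta, sigma^2); survival functions are
    # used throughout for numerical stability in the tails.
    num = norm.sf((a - theta) / sigma) - norm.sf((z - theta) / sigma)
    den = norm.sf((a - theta) / sigma)
    return num / den

# The pivot is monotone decreasing in theta, so each endpoint of an exact
# two-sided 90% interval is a root of a one-dimensional equation.
lower = brentq(lambda t: pivot(t) - 0.95, -10, 10)
upper = brentq(lambda t: pivot(t) - 0.05, -10, 10)
print(f"90% selective CI: ({lower:.2f}, {upper:.2f})")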